915 research outputs found

    Compositional Timing-Aware Semantics for Synchronous Programming

    Get PDF

    Automatic matching of legacy code to heterogeneous APIs: An idiomatic approach

    Get PDF
    Heterogeneous accelerators often disappoint. They provide the prospect of great performance, but only deliver it when using vendor specific optimized libraries or domain specific languages. This requires considerable legacy code modifications, hindering the adoption of heterogeneous computing. This paper develops a novel approach to automatically detect opportunities for accelerator exploitation. We focus on calculations that are well supported by established APIs: sparse and dense linear algebra, stencil codes and generalized reductions and histograms. We call them idioms and use a custom constraint-based Idiom Description Language (IDL) to discover them within user code. Detected idioms are then mapped to BLAS libraries, cuSPARSE and clSPARSE and two DSLs: Halide and Lift. We implemented the approach in LLVM and evaluated it on the NAS and Parboil sequential C/C++ benchmarks, where we detect 60 idiom instances. In those cases where idioms are a significant part of the sequential execution time, we generate code that achieves 1.26× to over 20× speedup on integrated and external GPUs

    A Novel WCET semantics of Synchronous Programs

    Get PDF

    High-level synthesis of functional patterns with Lift

    Get PDF

    SimBench: A Portable Benchmarking Methodology for Full-System Simulators

    Get PDF
    We acknowledge funding by the EPSRC grant PAMELA EP/K008730/1.Full-system simulators are increasingly finding their way into the consumer space for the purposes of backwards compatibility and hardware emulation (e.g. for games consoles). For such compute-intensive applications simulation performance is paramount. In this paper we argue that existing benchmarksuites such as SPEC CPU2006, originally designed for architecture and compiler performance evaluation, are not well suited for the identification of performance bottlenecks in full-system simulators. While their large, complex workloads provide an indication as to the performance of the simulator on ‘real-world’ workloads, this does not give any indication of why a particular simulator might run an application faster or slower than another. In this paper we present SimBench, an extensive suite of targeted micro-benchmarks designed to run bare-metal on a fullsystem simulator. SimBench exercises dynamic binary translation (DBT) performance, interrupt and exception handling, memoryaccess performance, I/O and other performance-sensitive areas. SimBench is cross-platform benchmarking framework and can be retargeted to new architectures with minimal effort. For several simulators, including QEMU, Gem5 and SimIt-ARM, and targeting ARM and Intel x86 architectures, we demonstrate that SimBench is capable of accurately pinpointing and explaining real-world performance anomalies, which are largely obfuscated by existing application-oriented benchmarks.Postprin

    Optimal and fast throughput evaluation of CSDF

    Get PDF
    International audienceThe Synchronous Dataow Graph (SDFG) and Cyclo-Static Dataow Graph (CSDFG) are two well-known models, used practically by industry for many years, and for which there is a large number of analysis techniques. Yet, basic problems such as the throughput computation or the liveness evaluation are not well solved, and their complexity is still unknown. In this paper, we propose K-Iter, an iterative algorithm based on K-periodic scheduling to compute the throughput of a CSDFG. By using this technique, we are able to compute in less than a minute the throughput of industry applications for which no result was available before

    Fast and Efficient Dataflow Graph Generation

    Get PDF
    International audienceDataflow modeling is a highly regarded method for the design of embedded systems. Measuring the performance of the associated analysis and compilation tools requires an efficient dataflow graph generator. This paper presents a new graph generator for Phased Computation Graphs (PCG), which augment Cyclo-Static Dataflow Graphs with both initial phases and thresholds. A sufficient condition of liveness is first extended to the PCG model. The determination of initial conditions minimizing the total amount of initial data in the channels and ensuring liveness can then be expressed using Integer Linear Programming. This contribution and other improvements of previous works are incorporated in Turbine, a new dataflow graph generator. Its effectiveness is demonstrated experimentally by comparing it to two existing generators, DFTools and SDF3

    Automatic Parameter Tuning of Motion Planning Algorithms

    Get PDF
    Motion planning algorithms attempt to find a good compromise between planning time and quality of solution. Due to their heuristic nature, they are typically configured with several parameters. In this paper we demonstrate that, in many scenarios, the widely used default parameter values are not ideal. However, finding the best parameters to optimise some metric(s) is not trivial because the size of the parameter space can be large. We evaluate and compare the efficiency of four different methods (i.e. random sampling, AUC-Bandit, random forest, and bayesian optimisation) to tune the parameters of two motion planning algorithms, BKPIECE and RRT-connect. We present a table-top-reaching scenario where the seven degrees-of-freedom KUKA LWR robotic arm has to move from an initial to a goal pose in the presence of several objects in the environment. We show that the best methods for BKPIECE (AUC-Bandit) and RRT-Connect (random forest) improve the performance by 4.5x and 1.26x on average respectively. Then, we generate a set of random scenarios of increasing complexity, and we observe that optimal parameters found in simple environments perform well in more complex scenarios. Finally, we find that the time required to evaluate parameter configurations can be reduced by more than 2/3 with low error. Overall, our results demonstrate that for a variety of motion planning problems it is possible to find solutions that significantly improve the performance over default configurations while requiring very reasonable computation times
    • …
    corecore